ci: lint/type/test floor + dataset & eval-baseline gates (Milestone A) by helebest · Pull Request #5 · OpenDIKW/dikw-data

helebest · 2026-06-25T14:29:57Z

What

Closes the CI/CD gap in dikw-data so it has the same deterministic floor as dikw-core, before Phase 1 dataset construction begins. This is Milestone A of the eval-plan Phase 0→1 work (see docs/dikw-eval-plan.md); the public-anchor calibration (scifact/cmteb) follows as Milestone B.

Changes

pyproject.toml — ruff + mypy (strict) config mirroring dikw-core. Ignores RUF001/RUF002/RUF003 (ambiguous-Unicode false positives on the CJK text embedded in this bilingual zh/en codebase) and ignores E702 only for the one procedural-pictogram generator.
.pre-commit-config.yaml — local ruff + mypy hooks (uv run), matching CI. Install with uv run pre-commit install.
.github/workflows/ci.yml — uv sync → ruff → mypy src → pytest → validate every dataset (shape gate, $0, no provider keys). Matrix 3.12 / 3.13.
.github/workflows/eval-gate.yml + tools/check_baselines.py — a datasets/** change must land a new dated reports/BASELINES.md entry naming a retrieval metric; override with the no-baseline-needed label. The pure check is unit-tested (tests/test_check_baselines.py).
.gitignore — track reports/BASELINES.md (the baseline log) while keeping per-run artifacts ignored; ignore .impeccable/.
Applied ruff autofixes (import sorting, unused-import / whitespace cleanup) across scripts/, src/, web/, tests/ so the existing tree passes the new gate. Fixed 3 real lint findings (RUF005 in run_eval.py, dead code in generate_queries_local.py) and 2 mypy findings (unused ignore; int**int Any-widening in llm_client.py).
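For readers who want a feel for the "pure check" behind the eval gate, here is a minimal sketch. It is not the code in tools/check_baselines.py: the function name `has_new_baseline_entry`, the dated-heading entry format, and the metric vocabulary are all assumptions made for illustration. It takes the lines a PR adds to reports/BASELINES.md and reports whether they contain a dated entry that names a retrieval metric.

```python
import re
from collections.abc import Iterable

# ASSUMPTION: an entry starts with a markdown heading that begins with an ISO
# date, e.g. "## 2026-06-25 some-dataset". The real BASELINES.md may differ.
DATED_HEADING = re.compile(r"^#{1,6}\s+\d{4}-\d{2}-\d{2}\b")

# ASSUMPTION: the metric names accepted here are illustrative only.
RETRIEVAL_METRIC = re.compile(r"\b(ndcg|recall|mrr)(@\d+)?\b", re.IGNORECASE)


def has_new_baseline_entry(added_lines: Iterable[str]) -> bool:
    """Return True if the added lines hold a dated heading and, on that
    line or a later one, the name of a retrieval metric."""
    seen_dated_heading = False
    for line in added_lines:
        if DATED_HEADING.match(line):
            seen_dated_heading = True
        # A metric only counts once a dated heading has been seen.
        if seen_dated_heading and RETRIEVAL_METRIC.search(line):
            return True
    return False
```

Keeping the check a pure function over a list of strings is what makes it easy to unit-test without git or the GitHub API; the workflow only has to feed it the added lines of the diff.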

Verification (local)

uv run ruff check . → clean
uv run mypy src → clean
uv run pytest → 50 passed
scripts/validate_dataset.py over all 3 datasets → valid
eval-gate plumbing tested end-to-end: non-dataset change → no-op pass; dataset change without a baseline entry → fail (exit 1); with a proper entry → pass.
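The three end-to-end outcomes above, plus the label override from the Changes list, can be sketched as a single decision function. This is a hedged illustration rather than the workflow's actual logic: `gate_exit_code` and its parameters are hypothetical names, and only the `datasets/` path prefix, the `no-baseline-needed` label, and the exit code 1 on failure come from this PR.

```python
OVERRIDE_LABEL = "no-baseline-needed"  # label name from the PR description
DATASET_PREFIX = "datasets/"           # mirrors the datasets/** path filter


def gate_exit_code(
    changed_paths: list[str],
    labels: set[str],
    has_baseline_entry: bool,
) -> int:
    """Return 0 when the gate passes and 1 when it fails.

    Hypothetical helper: has_baseline_entry stands in for the result of
    the pure check on reports/BASELINES.md.
    """
    touches_datasets = any(p.startswith(DATASET_PREFIX) for p in changed_paths)
    if not touches_datasets:
        return 0  # non-dataset change: no-op pass
    if OVERRIDE_LABEL in labels:
        return 0  # explicit override via the PR label
    return 0 if has_baseline_entry else 1
```

For example, `gate_exit_code(["datasets/x/queries.jsonl"], set(), False)` returns 1, which corresponds to the "dataset change without a baseline entry → fail" case.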

🤖 Generated with Claude Code

Close the CI/CD gap so dikw-data has the same deterministic floor as dikw-core before Phase 1 dataset construction begins.

- pyproject: ruff + mypy (strict) config, mirroring dikw-core; ignore RUF001/2/3 (false positives on this bilingual zh/en codebase's embedded CJK text) and scope-ignore E702 to the one procedural-pictogram generator.
- .pre-commit-config.yaml: local ruff + mypy hooks (uv run), matching CI.
- .github/workflows/ci.yml: uv sync -> ruff -> mypy src -> pytest -> validate every dataset (shape gate, $0, no provider keys). Matrix 3.12/3.13.
- .github/workflows/eval-gate.yml + tools/check_baselines.py: a dataset change (datasets/**) must land a new dated reports/BASELINES.md entry naming a retrieval metric; override with the `no-baseline-needed` label. Unit-tested.
- .gitignore: track reports/BASELINES.md (the baseline log) while keeping per-run artifacts ignored; ignore .impeccable/.
- Apply ruff autofixes (import sorting, unused-import / whitespace cleanup) across scripts/, src/, web/, tests/ so the existing tree passes the new gate. Fix 3 real lint findings (RUF005 in run_eval.py, dead code in generate_queries_local) and 2 mypy findings (unused ignore; int**int Any-widening in llm_client).

All green locally: ruff clean, mypy clean, 50 tests pass, 3 datasets validate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

This was referenced Jun 25, 2026

eval: record scifact + cmteb public-anchor calibration (Milestone B) #6

Closed

feat(datasets): Phase 1 in-house sets — domain-bilingual-v1 + negatives-ood-v1 #7

Merged

helebest force-pushed the ci/lint-and-eval-gates branch from 72527bd to 6d62374 Compare June 28, 2026 12:57

helebest merged commit 6639f08 into main Jun 28, 2026
2 checks passed

helebest deleted the ci/lint-and-eval-gates branch June 28, 2026 12:57

helebest mentioned this pull request Jun 28, 2026

eval: record scifact + cmteb public-anchor calibration (Milestone B) #9

Merged

helebest merged 1 commit into main from ci/lint-and-eval-gates
